1. Import Data

1.1 Data Description

The dataset contains quarterly tourist arrivals from the United States between 1981 and 2012, however, we missed data of the last quarter 2012, so I drop the data of 2012.

  • It includes one time series variable (US arrivals), measured four times per year (quarterly).

  • The data were originally obtained from the fpp2 package and are expressed in thousands of visitors.

library(readr)
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2   4.0.0      ✔ fma       2.5   
## ✔ forecast  8.24.0     ✔ expsmooth 2.3
## 
library(TTR)
arrivals_data_csv <- read_csv("arrivals_quarterly.csv")
## Rows: 127 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Quarter
## dbl (4): Japan, NZ, UK, US
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(arrivals_data_csv)
## # A tibble: 6 × 5
##   Quarter Japan    NZ    UK    US
##   <chr>   <dbl> <dbl> <dbl> <dbl>
## 1 1981 Q1 14.8   49.1  45.3  32.3
## 2 1981 Q2  9.32  87.5  19.9  23.7
## 3 1981 Q3 10.2   85.8  24.8  24.5
## 4 1981 Q4 19.5   61.9  52.3  33.4
## 5 1982 Q1 17.1   42.0  53.6  33.5
## 6 1982 Q2 10.6   63.1  34.8  28.4
spec(arrivals_data_csv)
## cols(
##   Quarter = col_character(),
##   Japan = col_double(),
##   NZ = col_double(),
##   UK = col_double(),
##   US = col_double()
## )
arrivals_ts_st <- ts(arrivals_data_csv$US, start=c(1981,1), frequency=4)

# drop the data of 2012 since it only has 3 quarter 
arrivals_ts <- window(arrivals_ts_st, end = c(2011, 4))
head(arrivals_ts)
##        Qtr1   Qtr2   Qtr3   Qtr4
## 1981 32.316 23.721 24.533 33.438
## 1982 33.527 28.366
tail(arrivals_ts)
##         Qtr1    Qtr2    Qtr3    Qtr4
## 2010                 111.615 127.002
## 2011 125.264 101.814 101.925 127.150

2. Data Overview

2.2 Time Series Plot

plot(arrivals_ts)

2.3 Summary of time series plot Observations

The time series of U.S. tourist arrivals (1981–2012) shows several key characteristics:

  • Upward trend: There is a clear long-term increase in tourist arrivals, particularly from the early 1980s through the late 1990s.

  • Strong seasonality: Regular quarterly peaks and troughs indicate recurring seasonal patterns, likely associated with holiday or vacation cycles.

  • Increasing variability: The fluctuations become larger over time, suggesting more volatile tourism demand in later years.

  • Recent flattening: After around 2000, the growth trend appears to slow down or level off, possibly due to economic factors or global events affecting travel.

Overall, the series demonstrates both trend and seasonality

2.4 ACF plot

acf(arrivals_ts)

2.5 Summary of ACF Plot Observations

The Autocorrelation Function (ACF) plot for the U.S. tourist arrivals time series shows the following characteristics:

  • Strong positive autocorrelation at small lags, gradually decreasing as lag increases.
    This pattern confirms the presence of a long-term trend in the data — consecutive observations are highly correlated.
  • Repeating spikes at lag multiples indicate a seasonal component with a quarterly frequency.
    This aligns with the dataset’s quarterly nature and the seasonal peaks observed in the time series plot.
  • The slow decay of autocorrelation over time suggests that the series is non-stationary, meaning it would likely need differencing or trend-seasonal modeling (e.g., Holt-Winters, SARIMA) for forecasting.

Overall, the ACF plot supports the earlier findings that the data exhibits both trend and seasonality

2.6 Central Tendency

# Display summary statistics
summary(arrivals_ts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   23.72   63.47   85.43   84.15  108.53  136.09
# Create a box plot for visualizing distribution
boxplot(arrivals_ts,
        main = "Boxplot of U.S. Tourist Arrivals (1981–2012)",
        ylab = "Number of Arrivals (in thousands)",
        col = "lightblue",
        border = "darkblue")

# box plot for visulizing distribution by year
years  <- floor(time(arrivals_ts))
values <- as.numeric(arrivals_ts)
df <- data.frame(year = years, value = values)
boxplot(value ~ year, data = df,
        main = "Boxplots by Year",
        xlab = "Year", ylab = "Arrivals (thousands)",
        col = "lightblue", border = "gray40", outline = TRUE, las = 2, cex.axis = 0.7)

2.7 Summary of Statistics and Box Plot Observations

The summary statistics for the U.S. tourist arrivals (1981–2012) are as follows:

Statistic Value
Minimum 23.72
1st Quartile (Q1) 63.95
Median 85.88
Mean 84.85
3rd Quartile (Q3) 108.98
Maximum 136.09

Interpretation:

  • The minimum and maximum values (23.72 to 136.09) indicate a wide range of variation in tourist arrivals across the period.
  • The mean (84.85) and median (85.88) are quite close, suggesting that the data is roughly symmetric with no extreme skewness.
  • The interquartile range (Q3–Q1 = 45.03) shows moderate dispersion, meaning arrivals fluctuated considerably across years.
  • The box plot visually supports these findings: most values fall between 60 and 110 thousand, with a few lower points (in early 1980s) acting as potential mild outliers.
  • Overall, the series exhibits an upward shift in central tendency over time, consistent with the long-term trend observed in the time series plot.

3. Models

3.1 Naive Model

# Model output
naive_forecast <- naive(arrivals_ts,8)
plot(naive_forecast)

The naïve model simply uses the last observed value of the time series as the forecast for all future periods.

  • The shaded region represents the forecast intervals, showing increasing uncertainty over time.

3.1.1 Residual Analysis

naive_residual = residuals(naive_forecast)
plot(naive_residual, main="Residuals from Naïve Forecast", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")

The residuals are centered near zero (no major bias). These patterns indicate that the Naïve method fails to fully account for the seasonal and trend components in the data.

hist(naive_residual, main="Histogram of Naïve Forecast Residuals",
     xlab="Residuals", col="lightblue", border="white")

The residuals are centered near zero, but the distribution isn’t symmetric.
That indicats the Naïve model doesn’t fit the data perfectly — some structure like seasonality is still left in the residuals.

plot(as.numeric(naive_forecast$fitted), as.numeric(naive_forecast$residuals),
     main = "Residuals vs Fitted (Naïve Model)",
     xlab = "fitted values", ylab = "Residuals")
abline(h = 0, col = "red")

Although the residuals are centered around zero, the distribution is not completely symmetric, indicating that the Naïve model does not fully capture the trend or seasonality. There is no clear linear pattern, suggesting that there are no strong systematic errors and the residuals are largely random in value. However, the residuals show considerable fluctuation, implying that the model’s forecasting accuracy is relatively low.

plot(as.numeric(arrivals_ts),as.numeric(naive_forecast$residuals),
     main = "Actual Values vs Residuals (Naïve Model)",
     xlab = "Actual Values", ylab = "Residuals",
     type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")

The “Actual vs Residuals” plot conveys a similar interpretation as the “Fitted vs Residuals” plot — the residuals are roughly centered around zero with no strong pattern, but the wide spread suggests the Naïve model does not fully capture the underlying structure of the time series.

Acf(naive_residual, main="ACF of Residuals - Naïve Forecast")

The ACF plot of residuals shows significant autocorrelation at seasonal lags (e.g., 4, 8, 12), indicating that the Naïve model has not fully captured the underlying seasonal pattern in the time series. Therefore, the residuals are not white noise.

accuracy(naive_forecast)
##                     ME     RMSE      MAE         MPE     MAPE     MASE
## Training set 0.7710081 12.47908 9.706016 -0.08056945 11.91185 1.301303
##                    ACF1
## Training set -0.1796656
naive_forecast$mean
##        Qtr1   Qtr2   Qtr3   Qtr4
## 2012 127.15 127.15 127.15 127.15
## 2013 127.15 127.15 127.15 127.15
plot(naive_forecast$mean)

autoplot(naive_forecast) +
  ggtitle("Naïve Forecast for Next 8 Quarters") +
  ylab("Tourist Arrivals (Thousands)") +
  xlab("Year") +
  theme_minimal()

Accuracy

The Naïve model’s accuracy is moderate. The RMSE of 12.48 and MAPE around 11.9% suggest that the model captures general stability in the series but cannot adjust for trend or seasonality.

Interpretation

  • The model predicts a constant level of 127.15 thousand arrivals per quarter for the next year.
  • Since the forecast is flat, it does not reflect the seasonal or cyclical variations present in the actual data.

Conclusion

The Naïve forecast provides a simple benchmark model with easy interpretation but limited predictive capability. It serves as a baseline for comparison against more sophisticated models such as Holt’s Trend or Holt-Winters.

3.2 Simple Moving Averages

# Compute moving averages of different orders
MA3_forecast <- ma(arrivals_ts, order = 3)
MA6_forecast <- ma(arrivals_ts, order = 6)
MA9_forecast <- ma(arrivals_ts, order = 9)

# Plot original time series
plot(arrivals_ts, main = "Simple Moving Averages (Orders 3, 6, 9)",
     ylab = "Tourist Arrivals (Thousands)", xlab = "Year", col = "black")

# Add the three moving averages
lines(MA3_forecast, col = "red", lwd = 2)
lines(MA6_forecast, col = "blue", lwd = 2)
lines(MA9_forecast, col = "green", lwd = 2)

# Forecast next 4 quarters using order 6 (bonus)
fit_ma6 <- auto.arima(na.omit(MA6_forecast))
forecast_ma6 <- forecast(fit_ma6, h = 4)
lines(forecast_ma6$mean, col = "purple", lwd = 2)

legend("topleft",
       legend = c("Original", "MA(3)", "MA(6)", "MA(9)", "Forecast MA(6)"),
       col = c("black", "red", "blue", "green", "purple"),
       lwd = 2, bty = "n")

Observations:

  • As the moving average order increases, the series becomes progressively smoother.
  • The MA(3) line (red) still follows most of the short-term fluctuations.
  • The MA(6) line (blue) reduces seasonal noise and captures medium-term trends better.
  • The MA(9) line (green) is the smoothest, showing the long-term movement but lagging behind actual changes.
  • Using MA(6), the forecast for the next 4 quarters predicts stable arrivals with mild variations.

Overall, increasing the order of the moving average reduces short-term noise but delays response to sudden changes in the data.

3.3 Simple Smoothing

ss_forecast = ses(arrivals_ts, h = 4)
plot(ss_forecast)

ss_forecast$model
## Simple exponential smoothing 
## 
## Call:
## ses(y = arrivals_ts, h = 4)
## 
##   Smoothing parameters:
##     alpha = 0.3625 
## 
##   Initial states:
##     l = 29.2613 
## 
##   sigma:  11.0571
## 
##      AIC     AICc      BIC 
## 1197.661 1197.861 1206.122

Model Parameters

  • Alpha (α) = 0.3625
    • Alpha is the smoothing parameter that controls how much weight is given to the most recent observation.
    • A value of 0.36 means that approximately 36% of the forecast depends on the latest observation, while 64% depends on past smoothed values.
    • This indicates a moderate level of responsiveness — the model reacts to new changes, but still retains some smoothing from historical data.
  • Initial State (l₀) = 29.2613
    • This represents the model’s starting level estimate at the beginning of the time series.
    • It acts as the baseline from which the exponential smoothing process begins.
  • Sigma (σ) = 11.0571
    • Sigma measures the standard deviation of residuals (forecast errors).
    • A sigma value of around 11.06 suggests that, on average, the forecasts deviate by about 11 units from the observed values.
    • The smaller the sigma, the more consistent and accurate the model fit; a higher sigma indicates higher variability and uncertainty.

Summary

Overall, the SES model with α = 0.36 provides a moderately smoothed forecast, effectively capturing the level of the series without modeling trend or seasonality. The residual variance (σ ≈ 11) shows reasonable stability, making SES suitable for short-term level forecasting.

ss_residual = residuals(ss_forecast)
plot(ss_residual, main="Residuals from Simple Smoothing Forecast", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")

The residuals fluctuate around zero, indicating no systematic bias in the model. However, the repeating peaks and troughs suggest that some seasonal pattern remains in the residuals — meaning the Simple Exponential Smoothing model has not captured the seasonality in the data.

hist(ss_residual, main = "Histogram of SS model Residual", xlab="Residuals", col="lightblue", border="white")

Observations:

  • The residuals appear approximately centered around zero, which indicates that the model does not have a consistent overestimation or underestimation bias.
  • The distribution shows a roughly bell-shaped pattern, though slightly skewed to the right, suggesting a few higher positive residuals.
  • There are no strong outliers or extreme deviations, meaning that most forecast errors are within a reasonable range.

Interpretation:

  • A roughly symmetric residual histogram centered near zero supports the assumption that the SES model captures the general level of the time series adequately.
  • However, the mild asymmetry suggests that the model may not fully capture certain fluctuations or seasonal effects in the data, which is expected since SES does not account for trend or seasonality.
plot(as.numeric(ss_forecast$fitted), as.numeric(ss_forecast$residuals),
     main="Fitted Values vs Residuals (SS Model)",
     xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")

The residuals are roughly centered around zero and randomly scattered, suggesting that the SS model adequately captures the general level of the data but may not fully explain trend or seasonal variation.

plot(as.numeric(arrivals_ts), as.numeric(ss_forecast$residuals),
     main = "Actual Values vs Residuals (SS Model)",
     xlab = "Actual Values", ylab = "Residuals",
     type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")

The residuals show no clear pattern against actual values, though slightly larger deviations occur at higher levels, suggesting the SS model struggles to capture high-season fluctuations.

Acf(ss_residual, main="ACF of Residuals - SS Forecast")

The ACF plot of residuals still shows significant spikes at seasonal lags (for example, around lag 4, 8, 12, etc.), indicating that seasonal correlation remains in the residuals. This means that the Simple Smoothing (SS) model — which only captures the level of the time series — did not adequately model the seasonal structure present in the data.

accuracy(ss_forecast)
##                    ME     RMSE      MAE      MPE     MAPE     MASE       ACF1
## Training set 1.933619 10.96758 8.848115 1.518322 10.87885 1.186282 0.06230072
ss_forecast
##         Point Forecast     Lo 80    Hi 80    Lo 95    Hi 95
## 2012 Q1       116.1663 101.99601 130.3365 94.49472 137.8378
## 2012 Q2       116.1663 101.09393 131.2386 93.11511 139.2174
## 2012 Q3       116.1663 100.24287 132.0897 91.81353 140.5190
## 2012 Q4       116.1663  99.43505 132.8975 90.57808 141.7545
plot(ss_forecast)

Accuracy summary
  • The SS model shows moderate forecasting accuracy, with an RMSE of 10.97, which is lower (better) than that of the Naïve model.
  • This indicates that the SS model provides a more stable fit and reduces forecast error to some extent.

The forecasted average value for the next year is approximately 116 (thousands), but each quarter shows different lower and upper prediction intervals (Lo/Hi 80–95%), indicating seasonal fluctuation and uncertainty around the mean forecast.

3.4 Holt-Winters

3.4.1 Holt Winters Model perform

hw_forecast = hw(arrivals_ts)
forecast_hw = forecast(hw_forecast, h = 4)
forecast_hw$model
## Holt-Winters' additive method 
## 
## Call:
## hw(y = arrivals_ts)
## 
##   Smoothing parameters:
##     alpha = 0.4975 
##     beta  = 1e-04 
##     gamma = 0.0921 
## 
##   Initial states:
##     l = 27.0003 
##     b = 0.7211 
##     s = 6.6781 -4.2792 -6.5704 4.1715
## 
##   sigma:  7.8361
## 
##      AIC     AICc      BIC 
## 1118.013 1119.592 1143.395
  • The Holt–Winters additive model (α ≈ 0.5, β ≈ 0, γ ≈ 0.09) effectively captures both level and seasonality in the U.S. tourist arrivals data.
  • The very small β shows a relatively stable trend, while the low σ suggests good forecast accuracy.
  • This model is expected to outperform SES and Naïve models for short-term forecasting where trend and seasonal variation exist.

3.4.2 Residual Analysis — Holt–Winters Model

hw_residual = residuals(forecast_hw)

#plot residual
plot(hw_residual, main = "Residuals from Holt Winter Model", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")

#histogram
hist(hw_residual, main = "Histogram of Holt Winter model Residual", xlab="Residuals", col="lightblue", border="white")

# fitted value vs residuals
plot(as.numeric(fitted(forecast_hw)), as.numeric(hw_residual),
     main="Fitted Values vs Residuals (Holt Winter Model)",
     xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")

# real values vs residuals
plot(as.numeric(arrivals_ts), as.numeric(hw_residual),
     main = "Actual Values vs Residuals (Holt Winter Model)",
     xlab = "Actual Values", ylab = "Residuals",
     type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")

#ACF plot of residuals
Acf(hw_residual, main="ACF of Residuals - Holt Winter Forecast")

Plot of Residuals

The small residual spikes near zero are random and expected, indicating that the Holt-Winters model effectively captures the underlying level, trend, and seasonality. Only a few larger peaks appear, but they do not follow a systematic seasonal pattern and can be considered random noise rather than model bias.

Histogram of Residuals

  • The histogram of residuals appears approximately symmetric and centered near zero, forming a bell-shaped curve.
  • This supports the assumption of normally distributed residuals and suggests that the Holt–Winters model provides unbiased forecasts.
  • A few minor deviations at the tails indicate small outliers, but they are not severe.

Both residual plots (Fitted vs Residuals and Actual vs Residuals) show that:

  • Most residual values lie between -10 and +10, indicating that forecast errors are relatively small.
  • The residuals are roughly symmetric around zero, suggesting that the model does not have a consistent overestimation or underestimation bias.
  • The scatter of residuals appears fairly random without any clear pattern or curvature, confirming that the Holt-Winters model captures most of the systematic structure (level, trend, and seasonality).
  • Only a few points lie outside the ±10 range, representing occasional large deviations, but these are isolated and not systematic.

ACF of Residuals

  • The ACF plot of residuals shows that most autocorrelation values fall within the 95% confidence bands.
  • This implies that residuals are uncorrelated over time, meeting the assumption of independence.
  • It also suggests that the Holt–Winters model has adequately captured the serial dependence in the original time series.

The residual diagnostics confirm that the Holt–Winters additive model provides a good fit for the data.
Residuals resemble white noise — centered around zero, normally distributed, and uncorrelated —
indicating that the model effectively explains both trend and seasonality in the tourist arrivals series.

accuracy(forecast_hw)
##                        ME   RMSE      MAE        MPE     MAPE      MASE
## Training set -0.003138538 7.5791 5.505299 -0.3492589 7.191839 0.7381052
##                   ACF1
## Training set 0.0132075
forecast_hw
##         Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 2012 Q1       125.2931 115.25079 135.3355 109.93469 140.6516
## 2012 Q2       108.0475  96.83056 119.2644  90.89269 125.2022
## 2012 Q3       113.8382 101.55815 126.1182  95.05750 132.6189
## 2012 Q4       126.0028 112.74427 139.2614 105.72561 146.2800
plot(forecast_hw)

Model Accuracy

The Holt–Winters additive model achieved strong accuracy metrics: - ME = –0.083, RMSE = 7.58, MAE = 5.51, MAPE = 7.18%, MASE = 0.74, ACF1 = 0.01.
These results indicate that forecast errors are small and unbiased.
The low RMSE and MAPE show the model performs substantially better than simpler methods such as Naïve or SES.
The near-zero ACF1 value confirms that residuals are nearly uncorrelated, which means the model has captured most of the systematic patterns.

Forecast for the Next Year

The model predicts tourist arrivals (in thousands) for 2012 as follows: | Quarter | Forecasted Value | |———-|——————| | Q1 | 125.29 | | Q2 | 108.05 | | Q3 | 113.84 | | Q4 | 126.00 |

Overall, the model expects a temporary dip in mid-year (Q2–Q3) followed by a recovery in Q4 —
a pattern consistent with the seasonal nature of U.S. tourist arrivals.

Additional Observations

  • The forecast pattern maintains the seasonal oscillation observed historically, confirming that the model successfully captures recurring quarterly cycles.
  • The residual diagnostics show no significant autocorrelation or bias, indicating a good model fit.
  • Compared with SES, this model provides more realistic short-term dynamics because it simultaneously accounts for both trend and seasonality.

Conclusion:
The Holt–Winters additive method delivers accurate, stable forecasts and effectively models both trend and seasonal variations,
making it one of the most suitable approaches for this quarterly tourism dataset.

3.5 Decomposition

#decomposition
stl_decomp <- stl(arrivals_ts, s.window = "periodic")
plot(stl_decomp)

attributes(stl_decomp)
## $names
## [1] "time.series" "weights"     "call"        "win"         "deg"        
## [6] "jump"        "inner"       "outer"      
## 
## $class
## [1] "stl"
  • The “seasonal” panel shows a strong, repeating quarterly pattern that remains stable across time. This confirms the presence of consistent seasonal variation every year.

  • The seasonal amplitude remains roughly constant over time — it does not increase with the overall level of the series. Hence, an additive model is appropriate.

  • Q1: below average tourist arrivals, Q2 sightly below average, Q3 peack tourist(summer), Q4 Above average(holiday season)

The STL decomposition clearly indicates a strong additive seasonal pattern with consistent quarterly peaks and troughs. Tourist arrivals peak in Q3 due to summer travel demand and drop in Q1 due to winter seasonality. The trend component shows a long-term upward growth, reflecting the overall increase in tourism over the decades.

# seasonal adjustment
seasadj(stl_decomp)
##           Qtr1      Qtr2      Qtr3      Qtr4
## 1981  25.84092  31.65618  29.13991  27.37099
## 1982  27.05192  36.30118  35.46291  27.22599
## 1983  25.99692  40.30418  42.08291  32.04499
## 1984  36.07792  49.21218  37.66291  38.40499
## 1985  41.31692  51.00518  45.72291  59.36099
## 1986  52.90192  61.21818  53.11691  79.81099
## 1987  77.65492  72.28218  70.58291  90.09799
## 1988  72.13392  77.24118  90.87391  86.67099
## 1989  66.84592  67.75418  65.47491  61.69399
## 1990  61.16192  64.08818  63.38991  62.95099
## 1991  56.75992  64.62618  90.07991  60.26499
## 1992  63.38592  69.43618  62.47591  67.56499
## 1993  69.84792  70.65318  69.33791  71.42799
## 1994  74.95292  71.48918  71.11191  72.12299
## 1995  76.57192  75.38318  73.84091  79.09299
## 1996  78.51992  78.94018  76.01891  83.41599
## 1997  80.92992  80.99618  84.59291  83.07399
## 1998  94.69192  93.31918  87.19291  98.69299
## 1999 107.38492 100.42918  99.60791 109.63699
## 2000 112.36492 115.24818 136.43991 124.03299
## 2001 120.22692 121.40618 109.84591  94.99099
## 2002 114.65192 104.95118 106.86791 108.03599
## 2003 106.43292 102.14718 107.56791 105.97199
## 2004 107.54192 109.00018 110.49091 106.26799
## 2005 116.24992 112.47718 112.33191 105.21899
## 2006 117.13292 117.80018 112.39391 108.75699
## 2007 118.20192 116.02418 112.34491 113.12199
## 2008 121.59092 107.40618 115.84991 109.56899
## 2009 112.68292 113.44218 123.58591 130.02699
## 2010 121.05992 113.90918 116.22191 120.93499
## 2011 118.78892 109.74918 106.53191 121.08299
# Plot a line on the graph
plot(arrivals_ts)
lines(seasadj(stl_decomp), col="Red")

The seasonally adjusted series (red line) smooths out the periodic ups and downs seen in the original data, confirming that the time series exhibits strong and consistent seasonal fluctuations that significantly affect the short-term values of tourist arrivals.

3.6 Accuracy Summary

# Create accuracy comparison table
model_comparison <- rbind(
  naive = accuracy(naive_forecast),
  ss = accuracy(ss_forecast),
  ma = accuracy(forecast_ma6),
  hw = accuracy(forecast_hw)
)

rownames(model_comparison) <- c( "Naive",  "SES", "Moving Average","Holt Winter")
round(model_comparison[, c("RMSE","MAE","MAPE","MASE")], 3)
##                  RMSE   MAE   MAPE  MASE
## Naive          12.479 9.706 11.912 1.301
## SES            10.968 8.848 10.879 1.186
## Moving Average  0.966 0.736  0.922 0.136
## Holt Winter     7.579 5.505  7.192 0.738

In this project, MASE was selected as the primary accuracy measure, as it provides a standardized comparison across models. Based on MASE, the Moving Average model achieved the best performance (MASE = 0.136), indicating that it significantly outperformed the Naïve benchmark and provided the most accurate forecasts.

Best and Worst Models by Metric

Accuracy Measure Best Model Worst Model Interpretation
MASE Moving Average Naïve MA performs best on a scaled basis, far outperforming all others.

4. Summary Interpretation

Overall

  • Over the entire period (1981 – 2012), the tourist arrivals time series shows a clear upward trend, reflecting long-term growth in international travel to the U.S.
  • The decomposition analysis reveals strong, stable seasonality — arrivals peak around Q3–Q4 (summer and holiday seasons) and drop in Q1 (winter months).
  • The seasonally adjusted series continues to trend upward, confirming steady underlying growth despite short-term cyclical dips.

Based on the selected models and forecasts:

  • Next Year (1-year horizon): The forecast suggests a slight increase in tourist arrivals, continuing the upward trajectory observed historically.
  • Next Two Years: The trend is expected to continue increasing gradually, though at a slower and more stabilized rate, indicating a mature and cyclic tourism market. Seasonal effects are expected to persist, with similar quarterly highs and lows repeating in the future.

Ranking of Forecasting Methods (Based on Historical Accuracy)

  1. Moving Average: Lowest overall errors (RMSE = 0.966, MAPE = 0.922%), indicating best short-term predictive accuracy.

  2. Holt-Winters:Effectively models both trend and seasonality; slightly higher error due to model complexity.

  3. Simple Exponential Smoothing (SES):Captures short-term level changes but lacks seasonality adjustment.

  4. Naïve Model:Baseline for comparison; highest error across all metrics.